feat(store): LocalTieredStore — Tier-1 SQLite + Tier-2 Parquet (epic #540 phase 3b) #545
Merged
Second slice of Phase 3 of epic #540. Adds a real tiered storage implementation that ships the analytics value (cross-run DuckDB / Polars queries) without yet replacing the canonical `.iafbt` bundle.

Layout under `<root>`:

- `index.sqlite` (Tier-1, always in sync)
- `bundles/<handle>.iafbt` (canonical bytes)
- `parquet/portfolio_snapshots/run_id=<h>/...` (Tier-2, hive-partitioned)
- `parquet/trades/run_id=<h>/...` (Tier-2)
- `parquet/orders/run_id=<h>/...` (Tier-2)

Phase 3b deliberately keeps the bundle as the canonical representation; Tier-2 sidecars are auxiliary, written best-effort, and a malformed sidecar never blocks a write or a read. This trivially preserves byte-identical `Backtest` round-trips today. Byte-identical Tier-2 → `Backtest` reassembly (no bundle on the read path) is Phase 3d.

- `decompose.py`: `Backtest` → flat record lists for snapshots / trades / orders, adding `run_id` and `window_name` columns so downstream tools group cleanly across walk-forward windows. Extension point for `metric_series` and any future kind is the `DATASETS` tuple.
- `LocalTieredStore`: implements `BacktestStore` + `SupportsCopyFrom`. `write()` saves the bundle, upserts the Tier-1 row, and writes hive-partitioned Parquet sidecars per dataset. `delete()` removes all three tiers. `iter_index_rows()` serves from SQLite directly. `rebuild_index()` recreates Tier-1 from the bundles (useful after a software upgrade that adds new index columns).
- `scan('portfolio_snapshots' | 'trades' | 'orders')` returns a `pyarrow.dataset.Dataset` that DuckDB / Polars can query across every run with partition pruning on `run_id`.
- 15 new tests: Protocol + `SupportsCopyFrom` conformance, three-tier layout, handle normalisation, round-trip, `summary_only`, Tier-1 always-in-sync (write/delete/len), Tier-2 cross-run scan, `copy_from` from `LocalDirStore`, `rebuild_index`, missing-handle errors. Includes a synthetic-records test that asserts hive partitions are written and that `scan()` returns the expected rows + columns.

Targeted suite (backtest_store + backtest_index + cli): 101 / 101 passing.
Second slice of Phase 3 of epic #540 — adds a real tiered storage implementation that delivers the cross-run analytics value while keeping the canonical `.iafbt` bundle as the source of truth (full Tier-2-as-canonical lands in Phase 3d).

## What's in this PR

### Storage layout
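The on-disk layout under `<root>`, restored here from the commit message above (the rendered tree block was lost in extraction):

```
<root>/
├── index.sqlite                             # Tier-1 (always in sync)
├── bundles/<handle>.iafbt                   # canonical bytes
└── parquet/
    ├── portfolio_snapshots/run_id=<h>/...   # Tier-2, hive-partitioned
    ├── trades/run_id=<h>/...                # Tier-2
    └── orders/run_id=<h>/...                # Tier-2
```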
The bundle is the canonical representation. Tier-1 and Tier-2 are derived, eagerly maintained, and best-effort: a malformed sidecar never blocks a write or read against the bundle. This keeps Phase 3b's invariants simple and trivially preserves byte-identical `Backtest.save_bundle` / `Backtest.open` round-trips today. Byte-identical Tier-2 → `Backtest` reassembly (no bundle on the read path) is Phase 3d.

### Modules
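The best-effort invariant can be sketched as follows — a minimal, self-contained illustration; the method and helper names are assumptions, not the project's actual internals:

```python
import logging

log = logging.getLogger("store")

class LocalTieredStoreSketch:
    """Illustrative only: helper names below are hypothetical."""

    def _save_bundle(self, bt) -> str:
        return "handle"                      # canonical bytes written here

    def _upsert_index_row(self, bt, handle) -> None:
        pass                                 # Tier-1 SQLite row

    def _write_parquet_sidecars(self, bt, handle) -> None:
        raise ValueError("malformed sidecar")  # simulate a Tier-2 failure

    def write(self, bt) -> str:
        handle = self._save_bundle(bt)       # canonical; failures propagate
        self._upsert_index_row(bt, handle)   # Tier-1: kept in sync
        try:
            self._write_parquet_sidecars(bt, handle)  # Tier-2: best-effort
        except Exception:
            log.warning("sidecar write failed for %s; bundle intact", handle)
        return handle

# The write succeeds even though the Tier-2 sidecar write blew up.
assert LocalTieredStoreSketch().write(object()) == "handle"
```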
- `decompose.py` — `Backtest` → flat record lists for snapshots / trades / orders, adding `run_id` and `window_name` columns so downstream tools group cleanly across walk-forward windows. Extension point for `metric_series` and any future kind is the `DATASETS` tuple.
- `LocalTieredStore` — implements `BacktestStore` + `SupportsCopyFrom`. `write()` saves the bundle, upserts the Tier-1 row, and writes hive-partitioned Parquet sidecars per dataset. `delete()` removes all three tiers. `iter_index_rows()` serves from SQLite directly. `rebuild_index()` recreates Tier-1 from the bundles (useful after a software upgrade that adds new index columns).

### Cross-run analytics
Or directly from DuckDB:
Partition pruning on `run_id` is automatic — DuckDB scans only the relevant directories.

## Tests
15 new tests, all passing:
- `BacktestStore` protocol and `SupportsCopyFrom` conformance.
- Three-tier layout created by `write()`.
- Handle normalisation (`.iafbt` suffix stripped).
- Round-trip preserves `algorithm_id`; `summary_only` honoured.
- Tier-1 stays in sync: `iter_index_rows` after writes; `delete` removes all tiers; `__len__` uses the index.
- `scan()` returns a `Dataset` with a `run_id` column; an unknown dataset name raises.
- `copy_from` from `LocalDirStore` (interop with Phase 3a).
- `rebuild_index()` recreates Tier-1 from bundles.
- Missing handle raises `StoreHandleNotFoundError`.
- Synthetic-records test: `scan()` returns the expected rows + columns when records are present.

Targeted suite (`tests/services/backtest_store/` + `tests/services/backtest_index/` + `tests/cli/`): 101 / 101 passing.

## What's still coming in Phase 3
- SHA-256 dedup of `ohlcv`, `code`, `params`, `symbols` — where the 64 GB → 20 GB headline lives.
- `iaf migrate-store --from local-dir --to local-tiered`.
- Byte-identical Tier-2 → `Backtest` reassembly (`.iafbt` becomes export-only).
- The parameterised test fixture that runs every backtest test against both stores.
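The dedup idea behind that headline can be sketched in a few lines — a content-addressed blob write, where identical payloads shared across runs are stored once (hypothetical helper, not the planned API):

```python
import hashlib
import pathlib
import tempfile

blobs = pathlib.Path(tempfile.mkdtemp())

def put_blob(data: bytes) -> str:
    """Content-addressed write: the SHA-256 digest is the filename, so the
    same ohlcv frame referenced by 50 walk-forward runs occupies disk once."""
    digest = hashlib.sha256(data).hexdigest()
    path = blobs / digest
    if not path.exists():
        path.write_bytes(data)
    return digest

h1 = put_blob(b"ohlcv bytes shared across runs")
h2 = put_blob(b"ohlcv bytes shared across runs")
assert h1 == h2                           # same content, same address
assert len(list(blobs.iterdir())) == 1    # stored exactly once
```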